Collecting h2o
  Downloading h2o-3.32.1.5.tar.gz (164.8 MB)
     |████████████████████████████████| 164.8 MB 2.0 kB/s 
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from h2o) (2.23.0)
Requirement already satisfied: tabulate in /usr/local/lib/python3.7/dist-packages (from h2o) (0.8.9)
Requirement already satisfied: future in /usr/local/lib/python3.7/dist-packages (from h2o) (0.16.0)
Collecting colorama>=0.3.8
  Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->h2o) (2021.5.30)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->h2o) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->h2o) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->h2o) (3.0.4)
Building wheels for collected packages: h2o
  Building wheel for h2o (setup.py) ... done
  Created wheel for h2o: filename=h2o-3.32.1.5-py2.py3-none-any.whl size=164886106 sha256=9c17f11aeae449c0f5edc6a4daef60eec58bb800d668f382cc12dbe1de357a68
  Stored in directory: /root/.cache/pip/wheels/2f/f4/f6/7115a720596f0b6c377b3d82c28242585c7bb7ab27d430f97c
Successfully built h2o
Installing collected packages: colorama, h2o
Successfully installed colorama-0.4.4 h2o-3.32.1.5
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.11" 2021-04-20; OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04); OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)
  Starting server from /usr/local/lib/python3.7/dist-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmpx7kye1gn
  JVM stdout: /tmp/tmpx7kye1gn/h2o_unknownUser_started_from_python.out
  JVM stderr: /tmp/tmpx7kye1gn/h2o_unknownUser_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
H2O_cluster_uptime: 03 secs
H2O_cluster_timezone: Etc/UTC
H2O_data_parsing_timezone: UTC
H2O_cluster_version: 3.32.1.4
H2O_cluster_version_age: 11 days
H2O_cluster_name: H2O_from_python_unknownUser_0tvhwr
H2O_cluster_total_nodes: 1
H2O_cluster_free_memory: 3.172 Gb
H2O_cluster_total_cores: 2
H2O_cluster_allowed_cores: 2
H2O_cluster_status: accepting new members, healthy
H2O_connection_url: http://127.0.0.1:54321
H2O_connection_proxy: {"http": null, "https": null}
H2O_internal_security: False
H2O_API_Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python_version: 3.7.11 final
Parse progress: |█████████████████████████████████████████████████████████| 100%
AutoML progress: |████████████████████████████████████████████████████████| 100%

Leaderboard

The leaderboard shows models with their metrics. When provided with an H2OAutoML object, the leaderboard shows 5-fold cross-validated metrics by default (depending on the H2OAutoML settings); otherwise, it shows metrics computed on the given frame. At most 20 models are shown by default.
model_id                                             auc       logloss   aucpr     mean_per_class_error  rmse      mse        training_time_ms  predict_time_per_row_ms  algo
GBM_grid__1_AutoML_20210720_110510_model_1           0.79592   0.254477  0.219521  0.273997              0.271719  0.073831   341               0.049602                 GBM
GBM_grid__1_AutoML_20210720_110510_model_2           0.786609  0.288506  0.183858  0.297528              0.287153  0.0824567  288               0.04541                  GBM
XGBoost_grid__1_AutoML_20210720_110510_model_6       0.78225   0.250349  0.211155  0.249281              0.266756  0.071159   124               0.023677                 XGBoost
XGBoost_grid__1_AutoML_20210720_110510_model_2       0.781446  0.258267  0.230984  0.282165              0.271174  0.0735355  170               0.029304                 XGBoost
XGBoost_grid__1_AutoML_20210720_110510_model_5       0.771542  0.256891  0.197349  0.269807              0.270226  0.0730219  151               0.015605                 XGBoost
StackedEnsemble_BestOfFamily_AutoML_20210720_110510  0.76968   0.256505  0.208264  0.328593              0.270608  0.0732289  394               0.093196                 StackedEnsemble
XGBoost_grid__1_AutoML_20210720_110510_model_1       0.76858   0.328     0.193323  0.295708              0.295434  0.0872812  151               0.019162                 XGBoost
GBM_1_AutoML_20210720_110510                         0.767564  0.274064  0.184669  0.290545              0.278576  0.0776043  300               0.01698                  GBM
XGBoost_grid__1_AutoML_20210720_110510_model_4       0.766887  0.310873  0.1851    0.309675              0.289033  0.0835403  150               0.033728                 XGBoost
XGBoost_3_AutoML_20210720_110510                     0.766548  0.271105  0.183988  0.286355              0.280231  0.0785293  141               0.019065                 XGBoost
XGBoost_grid__1_AutoML_20210720_110510_model_8       0.765913  0.259283  0.172262  0.341163              0.269354  0.0725518  103               0.012563                 XGBoost
GBM_grid__1_AutoML_20210720_110510_model_4           0.757872  0.258901  0.177716  0.245725              0.271576  0.0737535  200               0.018967                 GBM
GBM_5_AutoML_20210720_110510                         0.757237  0.254046  0.182285  0.269807              0.267041  0.0713109  155               0.017601                 GBM
GBM_grid__1_AutoML_20210720_110510_model_3           0.755502  0.255088  0.184569  0.26422               0.266686  0.0711213  224               0.017665                 GBM
XGBoost_grid__1_AutoML_20210720_110510_model_7       0.748773  0.307267  0.178202  0.269807              0.291993  0.0852597  153               0.015149                 XGBoost
XGBoost_grid__1_AutoML_20210720_110510_model_9       0.748138  0.271588  0.174836  0.311283              0.272789  0.0744139  86                0.013119                 XGBoost
StackedEnsemble_AllModels_AutoML_20210720_110510     0.746657  0.267751  0.161574  0.320637              0.27363   0.0748732  365               0.10245                  StackedEnsemble
GBM_2_AutoML_20210720_110510                         0.737176  0.262356  0.197863  0.28754               0.269736  0.0727577  197               0.016535                 GBM
GBM_3_AutoML_20210720_110510                         0.732013  0.26938   0.165151  0.286567              0.275174  0.0757208  275               0.016725                 GBM
XGBoost_grid__1_AutoML_20210720_110510_model_3       0.732013  0.26124   0.179714  0.277002              0.269178  0.0724568  102               0.015415                 XGBoost
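
The metric columns are internally consistent: MSE is simply RMSE squared. A quick check against the first leaderboard row (values copied from the table above):

```python
# Values from the leaderboard's top row (GBM_grid__1_..._model_1).
rmse = 0.271719
mse = 0.073831

# mse should equal rmse squared (up to the table's rounding).
assert abs(rmse ** 2 - mse) < 1e-4
print(round(rmse ** 2, 6))  # 0.073831
```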

Confusion Matrix

The confusion matrix shows the predicted class versus the actual class.

GBM_grid__1_AutoML_20210720_110510_model_1

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.2557828965438706: 
       false  true  Error   Rate
false  351.0  7.0   0.0196  (7.0/358.0)
true   8.0    25.0  0.2424  (8.0/33.0)
Total  359.0  32.0  0.0384  (15.0/391.0)
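
The per-class error rates in the matrix are just misclassified counts divided by each row's total. Recomputing them from the counts above:

```python
# Counts from the confusion matrix: rows are actual classes.
# Actual "false": 351 predicted false, 7 predicted true.
# Actual "true":  8 predicted false, 25 predicted true.
err_false = 7 / (351 + 7)                  # error on actual "false" rows
err_true = 8 / (8 + 25)                    # error on actual "true" rows
err_total = (7 + 8) / (351 + 7 + 8 + 25)   # overall error rate

print(round(err_false, 4), round(err_true, 4), round(err_total, 4))
# 0.0196 0.2424 0.0384
```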

Variable Importance

The variable importance plot shows the relative importance of the most important variables in the model.

Variable Importance Heatmap

The variable importance heatmap shows variable importance across multiple models. Some models in H2O (e.g., Deep Learning, XGBoost) return variable importance for one-hot (binary indicator) encoded versions of categorical columns. So that the variable importance of categorical columns can be compared across all model types, we summarize the variable importance across all one-hot encoded features and return a single variable importance for the original categorical feature. By default, the models and variables are ordered by their similarity.

Model Correlation

This plot shows the correlation between the predictions of the models. For classification, the frequency of identical predictions is used. By default, models are ordered by their similarity (as computed by hierarchical clustering). Interpretable models, such as GAM, GLM, and RuleFit, are highlighted in red text.
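
For the classification case, the pairwise statistic is just the fraction of rows on which two models predict the same label. A toy sketch (made-up predictions, not from the run above):

```python
# "Correlation" of two classifiers here means the frequency of identical
# predicted labels, as described above.
def prediction_agreement(preds_a, preds_b):
    assert len(preds_a) == len(preds_b)
    same = sum(a == b for a, b in zip(preds_a, preds_b))
    return same / len(preds_a)

# Hypothetical predicted labels for two models on five rows.
model_a = ["true", "false", "false", "true", "false"]
model_b = ["true", "false", "true", "true", "false"]
print(prediction_agreement(model_a, model_b))  # 0.8
```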

SHAP Summary

The SHAP summary plot shows the contribution of the features for each instance (row of data). The sum of the feature contributions and the bias term equals the raw prediction of the model, i.e., the prediction before the inverse link function is applied.
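
The additivity property can be sketched with toy numbers (the contributions and bias below are made up; a logistic inverse link is assumed, as for this binomial model):

```python
import math

# Hypothetical per-feature SHAP contributions and bias term for one row.
contributions = {"f1": 0.7, "f2": -0.2, "f3": 0.1}
bias = -1.4

# Raw (link-scale) prediction = sum of contributions + bias.
raw = sum(contributions.values()) + bias       # -0.8

# Final probability = inverse link (logistic) of the raw prediction.
prob = 1 / (1 + math.exp(-raw))
print(round(prob, 4))
```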

Partial Dependence Plots

The partial dependence plot (PDP) gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured as the change in the mean response. The PDP assumes independence between the feature for which it is computed and the other features.
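
The averaging behind a PDP can be sketched in a few lines (the model and data below are made up for illustration): for each grid value, fix the feature of interest at that value for every row and average the predictions.

```python
# A made-up model of two features, for illustration only.
def model(x1, x2):
    return 2 * x1 + x2

# Toy dataset: rows of (x1, x2).
data = [(1, 10), (2, 20), (3, 30)]

def partial_dependence(grid_value):
    # Fix x1 at grid_value for every row, average the predictions.
    return sum(model(grid_value, x2) for _, x2 in data) / len(data)

print([partial_dependence(v) for v in (0, 1, 2)])  # [20.0, 22.0, 24.0]
```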

Confusion Matrix

The confusion matrix shows the predicted class versus the actual class.

GBM_grid__1_AutoML_20210720_110510_model_1

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.2557828965438706: 
       false  true  Error   Rate
false  351.0  7.0   0.0196  (7.0/358.0)
true   8.0    25.0  0.2424  (8.0/33.0)
Total  359.0  32.0  0.0384  (15.0/391.0)

Variable Importance

The variable importance plot shows the relative importance of the most important variables in the model.

SHAP Summary

The SHAP summary plot shows the contribution of the features for each instance (row of data). The sum of the feature contributions and the bias term equals the raw prediction of the model, i.e., the prediction before the inverse link function is applied.

Partial Dependence Plots

The partial dependence plot (PDP) gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured as the change in the mean response. The PDP assumes independence between the feature for which it is computed and the other features.

Checking whether there is an H2O instance running at http://localhost:54321 . connected.
H2O_cluster_uptime: 3 mins 16 secs
H2O_cluster_timezone: Etc/UTC
H2O_data_parsing_timezone: UTC
H2O_cluster_version: 3.32.1.4
H2O_cluster_version_age: 11 days
H2O_cluster_name: H2O_from_python_unknownUser_0tvhwr
H2O_cluster_total_nodes: 1
H2O_cluster_free_memory: 3.163 Gb
H2O_cluster_total_cores: 2
H2O_cluster_allowed_cores: 2
H2O_cluster_status: locked, healthy
H2O_connection_url: http://localhost:54321
H2O_connection_proxy: {"http": null, "https": null}
H2O_internal_security: False
H2O_API_Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python_version: 3.7.11 final
Parse progress: |█████████████████████████████████████████████████████████| 100%
AutoML progress: |████████████████████████████████████████████████████████| 100%

Leaderboard

The leaderboard shows models with their metrics. When provided with an H2OAutoML object, the leaderboard shows 5-fold cross-validated metrics by default (depending on the H2OAutoML settings); otherwise, it shows metrics computed on the given frame. At most 20 models are shown by default.
model_id                                             mean_residual_deviance  rmse     mse      mae      rmsle     training_time_ms  predict_time_per_row_ms  algo
StackedEnsemble_AllModels_AutoML_20210720_110822     17633.2                 132.79   17633.2  88.4714  0.163249  271               0.147336                 StackedEnsemble
StackedEnsemble_BestOfFamily_AutoML_20210720_110822  17933.6                 133.916  17933.6  90.434   0.164036  144               0.093224                 StackedEnsemble
GBM_grid__1_AutoML_20210720_110822_model_1           26298.8                 162.169  26298.8  112.076  0.175313  424               0.061966                 GBM
XGBoost_grid__1_AutoML_20210720_110822_model_4       26312.2                 162.21   26312.2  114.551  0.153739  180               0.020614                 XGBoost
XGBoost_grid__1_AutoML_20210720_110822_model_3       26845.6                 163.846  26845.6  116.373  0.144884  236               0.006608                 XGBoost
XGBoost_grid__1_AutoML_20210720_110822_model_5       33015.2                 181.701  33015.2  129.204  0.125457  677               0.016865                 XGBoost
XGBoost_grid__1_AutoML_20210720_110822_model_9       36505.7                 191.065  36505.7  132.228  0.156883  177               0.003776                 XGBoost
XGBoost_grid__1_AutoML_20210720_110822_model_2       40280.3                 200.7    40280.3  144.697  0.155139  663               0.014117                 XGBoost
XGBoost_3_AutoML_20210720_110822                     40660.6                 201.645  40660.6  145.186  0.17298   166               0.011042                 XGBoost
XGBoost_1_AutoML_20210720_110822                     40680                   201.693  40680    143.499  0.150462  830               0.018056                 XGBoost
XGBoost_grid__1_AutoML_20210720_110822_model_1       45457.6                 213.208  45457.6  148.708  0.18636   479               0.010564                 XGBoost
XGBoost_grid__1_AutoML_20210720_110822_model_8       46342.1                 215.272  46342.1  158.695  0.168022  595               0.019797                 XGBoost
XGBoost_grid__1_AutoML_20210720_110822_model_6       47555.9                 218.073  47555.9  156.613  0.182885  154               0.010723                 XGBoost
GBM_1_AutoML_20210720_110822                         49826.3                 223.218  49826.3  161.259  0.185001  280               0.021325                 GBM
XGBoost_2_AutoML_20210720_110822                     50881.3                 225.569  50881.3  161.075  0.19397   714               0.01349                  XGBoost
GBM_3_AutoML_20210720_110822                         53412.8                 231.112  53412.8  157.073  0.203502  260               0.012073                 GBM
GBM_grid__1_AutoML_20210720_110822_model_2           61496.6                 247.985  61496.6  178.095  0.190502  461               0.035649                 GBM
DRF_1_AutoML_20210720_110822                         73276.5                 270.696  73276.5  182.53   0.204566  303               0.021696                 DRF
GBM_4_AutoML_20210720_110822                         75087                   274.02   75087    188.544  0.222515  265               0.022493                 GBM
XGBoost_grid__1_AutoML_20210720_110822_model_7       80016.5                 282.872  80016.5  193.155  0.209027  178               0.022893                 XGBoost
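
Note that for a Gaussian distribution (the default for this regression run, judging by the table), mean residual deviance reduces to MSE, which is why the mean_residual_deviance and mse columns are identical; RMSE squared recovers the same value. A check against the first row:

```python
# Values from the top row (StackedEnsemble_AllModels_AutoML_20210720_110822).
mean_residual_deviance = 17633.2
rmse = 132.79
mse = 17633.2

# Gaussian deviance == MSE, and rmse**2 == mse (up to the table's rounding).
assert mean_residual_deviance == mse
assert abs(rmse ** 2 - mse) < 1.0   # 132.79**2 = 17633.1841
```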

Residual Analysis

Residual analysis plots the fitted values versus the residuals on a test dataset. Ideally, residuals should be randomly distributed. Patterns in this plot can indicate potential problems with the model choice, e.g., using a simpler model than necessary, or not accounting for heteroscedasticity or autocorrelation. Note that "striped" lines of residuals are an artifact of an integer-valued (rather than real-valued) response variable.
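
The residuals being plotted are simply actual minus fitted values; a toy computation (made-up numbers):

```python
# Residual = actual - fitted. In the plot, residuals go on the y-axis
# against fitted values on the x-axis; ideally they show no pattern.
actual = [100, 150, 200, 250]
fitted = [110, 140, 205, 245]

residuals = [a - f for a, f in zip(actual, fitted)]
print(residuals)  # [-10, 10, -5, 5]
```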

Variable Importance

The variable importance plot shows the relative importance of the most important variables in the model.

Variable Importance Heatmap

The variable importance heatmap shows variable importance across multiple models. Some models in H2O (e.g., Deep Learning, XGBoost) return variable importance for one-hot (binary indicator) encoded versions of categorical columns. So that the variable importance of categorical columns can be compared across all model types, we summarize the variable importance across all one-hot encoded features and return a single variable importance for the original categorical feature. By default, the models and variables are ordered by their similarity.

Model Correlation

This plot shows the correlation between the predictions of the models. For classification, the frequency of identical predictions is used. By default, models are ordered by their similarity (as computed by hierarchical clustering). Interpretable models, such as GAM, GLM, and RuleFit, are highlighted in red text.

SHAP Summary

The SHAP summary plot shows the contribution of the features for each instance (row of data). The sum of the feature contributions and the bias term equals the raw prediction of the model, i.e., the prediction before the inverse link function is applied.

Partial Dependence Plots

The partial dependence plot (PDP) gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured as the change in the mean response. The PDP assumes independence between the feature for which it is computed and the other features.

Individual Conditional Expectation

An Individual Conditional Expectation (ICE) plot gives a graphical depiction of the marginal effect of a variable on the response. ICE plots are similar to partial dependence plots (PDPs): a PDP shows the average effect of a feature, while an ICE plot shows the effect for a single instance. This function plots the effect for each decile. In contrast to the PDP, ICE plots can provide more insight, especially when there are strong feature interactions.
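
The relationship between ICE and PDP can be sketched with a toy model (made up here, with one curve per row rather than per decile): each row gets its own curve over the feature grid, and the PDP is the pointwise average of those curves.

```python
# A made-up model with a feature interaction, so ICE curves differ in slope.
def model(x1, x2):
    return x1 * x2

rows_x2 = [1, 2, 3]   # the x2 value of each toy instance
grid = [0, 1, 2]      # grid of x1 values to sweep

# One ICE curve per instance: vary x1, hold that instance's x2 fixed.
ice = [[model(g, x2) for g in grid] for x2 in rows_x2]

# The PDP is the pointwise average of the ICE curves.
pdp = [sum(curve[i] for curve in ice) / len(ice) for i in range(len(grid))]

print(ice)  # [[0, 1, 2], [0, 2, 4], [0, 3, 6]]
print(pdp)  # [0.0, 2.0, 4.0]
```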

Residual Analysis

Residual analysis plots the fitted values versus the residuals on a test dataset. Ideally, residuals should be randomly distributed. Patterns in this plot can indicate potential problems with the model choice, e.g., using a simpler model than necessary, or not accounting for heteroscedasticity or autocorrelation. Note that "striped" lines of residuals are an artifact of an integer-valued (rather than real-valued) response variable.

Partial Dependence Plots

The partial dependence plot (PDP) gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured as the change in the mean response. The PDP assumes independence between the feature for which it is computed and the other features.

Individual Conditional Expectation

An Individual Conditional Expectation (ICE) plot gives a graphical depiction of the marginal effect of a variable on the response. ICE plots are similar to partial dependence plots (PDPs): a PDP shows the average effect of a feature, while an ICE plot shows the effect for a single instance. This function plots the effect for each decile. In contrast to the PDP, ICE plots can provide more insight, especially when there are strong feature interactions.

Checking whether there is an H2O instance running at http://localhost:54321 . connected.
H2O_cluster_uptime: 1 min 53 secs
H2O_cluster_timezone: Etc/UTC
H2O_data_parsing_timezone: UTC
H2O_cluster_version: 3.32.1.5
H2O_cluster_version_age: 2 days
H2O_cluster_name: H2O_from_python_unknownUser_hndmmi
H2O_cluster_total_nodes: 1
H2O_cluster_free_memory: 3.172 Gb
H2O_cluster_total_cores: 2
H2O_cluster_allowed_cores: 2
H2O_cluster_status: locked, healthy
H2O_connection_url: http://localhost:54321
H2O_connection_proxy: {"http": null, "https": null}
H2O_internal_security: False
H2O_API_Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python_version: 3.7.11 final
Parse progress: |█████████████████████████████████████████████████████████| 100%
AutoML progress: |████████████████████████████████████████████████████████| 100%

Residual Analysis

Residual analysis plots the fitted values versus the residuals on a test dataset. Ideally, residuals should be randomly distributed. Patterns in this plot can indicate potential problems with the model choice, e.g., using a simpler model than necessary, or not accounting for heteroscedasticity or autocorrelation. Note that "striped" lines of residuals are an artifact of an integer-valued (rather than real-valued) response variable.

Partial Dependence Plots

The partial dependence plot (PDP) gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured as the change in the mean response. The PDP assumes independence between the feature for which it is computed and the other features.

Individual Conditional Expectation

An Individual Conditional Expectation (ICE) plot gives a graphical depiction of the marginal effect of a variable on the response. ICE plots are similar to partial dependence plots (PDPs): a PDP shows the average effect of a feature, while an ICE plot shows the effect for a single instance. This function plots the effect for each decile. In contrast to the PDP, ICE plots can provide more insight, especially when there are strong feature interactions.